Mediators of the association between educational attainment and type 2 diabetes mellitus: a two-step multivariable Mendelian randomisation study

Aims/hypothesis: Type 2 diabetes mellitus is a major health burden disproportionately affecting those with lower educational attainment (EA). We aimed to obtain causal estimates of the association between EA and type 2 diabetes and to quantify mediating effects of known modifiable risk factors.
Methods: We applied two-step, two-sample multivariable Mendelian randomisation (MR) techniques using SNPs as genetic instruments for exposure and mediators, thereby minimising bias due to confounding and reverse causation. We leveraged summary data on genome-wide association studies for EA, proposed mediators (i.e. BMI, blood pressure, smoking, television watching) and type 2 diabetes. The total effect of EA on type 2 diabetes was decomposed into a direct effect and indirect effects through multiple mediators. Additionally, traditional mediation analysis was performed in a subset of the National Health and Nutrition Examination Survey 2013–2014.
Results: EA was inversely associated with type 2 diabetes (OR 0.53 for each 4.2 years of schooling; 95% CI 0.49, 0.56). Individually, the largest contributors were BMI (51.18% mediation; 95% CI 46.39%, 55.98%) and television watching (50.79% mediation; 95% CI 19.42%, 82.15%). Combined, the mediators explained 83.93% (95% CI 70.51%, 96.78%) of the EA–type 2 diabetes association. Traditional analysis yielded smaller effects but showed consistent direction and priority ranking of mediators.
Conclusions/interpretation: These results support a potentially causal protective effect of EA against type 2 diabetes, with considerable mediation by a number of modifiable risk factors. Interventions on these factors thus have the potential of substantially reducing the burden of type 2 diabetes attributable to low EA.
Supplementary Information: The online version contains peer-reviewed but unedited supplementary material available at 10.1007/s00125-022-05705-6.


ESM Methods 1. Mendelian randomisation assumptions and methods
Causal inference in traditional observational epidemiological studies is limited due to confounding and reverse causation. MR is a method that can be used to uncover causal relationships between exposure and outcome in the presence of such limitations [1]. MR uses SNPs to genetically predict exposures. MR estimates are unconfounded, and thus valid estimates of causality, under a number of key assumptions.
The first is the relevance assumption, which requires that the genetic instruments are strongly associated with the exposure; this assumption is satisfied by selecting SNPs with robust, genome-wide significant (P < 5 × 10⁻⁸) and replicated associations. The second is the independence assumption, which requires that genetic instruments are not associated with any confounder of the relationship between exposure and outcome; this is assumed to hold because of Mendel's law of independent assortment, under which genetic variants for a certain trait are inherited independently of other traits. The third is the exclusion restriction criterion, which requires that any effect of the genetic instrument on the outcome variable is solely through the exposure variable. This assumption might be violated due to horizontal pleiotropy, in which a genetic instrument has direct effects on both the exposure and the outcome. To assess the robustness of our results against horizontal pleiotropy, we performed sensitivity analyses. First, we assessed the heterogeneity of Wald ratios (i.e. single-SNP MR effect estimates) to find evidence of potential pleiotropy; large heterogeneity in Wald ratios is suggestive of horizontal pleiotropy (Supp Table S1). Second, we examined MR funnel plots of Wald ratios; asymmetry in the funnel plots is suggestive of directional horizontal pleiotropy (data not shown). Third, we examined the Egger intercept; significant deviation from a zero intercept is suggestive of directional horizontal pleiotropy (Supp Table S2).
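The heterogeneity check on Wald ratios can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the summary statistics are hypothetical, Cochran's Q is assumed as the heterogeneity statistic, and the standard errors use a first-order approximation that ignores uncertainty in the SNP-exposure estimates.

```python
def wald_ratios(beta_exposure, beta_outcome, se_outcome):
    """Return per-SNP Wald ratios and their first-order standard errors."""
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    ses = [abs(sy / bx) for bx, sy in zip(beta_exposure, se_outcome)]
    return ratios, ses


def ivw_and_cochran_q(ratios, ses):
    """Pool Wald ratios by inverse-variance weighting and compute Cochran's Q."""
    weights = [1.0 / s ** 2 for s in ses]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    # Q is compared with a chi-square distribution on (number of SNPs - 1) df.
    return ivw, q, len(ratios) - 1


# Hypothetical summary statistics for four SNPs (illustration only).
beta_exposure = [0.10, 0.20, 0.05, 0.15]
beta_outcome = [-0.05, -0.10, -0.025, 0.03]  # the last SNP deviates from the rest
se_outcome = [0.01, 0.01, 0.01, 0.01]

ratios, ses = wald_ratios(beta_exposure, beta_outcome, se_outcome)
ivw, q, df = ivw_and_cochran_q(ratios, ses)
print(f"IVW estimate: {ivw:.3f}, Cochran's Q: {q:.2f} on {df} df")
```

A Q value far above its degrees of freedom, as produced here by the deviating fourth SNP, is the pattern that would prompt a closer look at pleiotropy.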
Finally, we conducted two sensitivity MR analyses that relax the exclusion restriction criterion: a) Mendelian randomisation-Egger (MR-Egger) [2,3], which allows for estimation of causal effects in the presence of directional horizontal pleiotropy, but assumes that the SNP strength is independent of the direct SNP effect on the outcome (the Instrument Strength Independent of Direct Effect, or InSIDE, assumption [2]); and b) weighted median MR [4], which is based on the median Wald estimate and allows for consistent estimation even when up to 50% of the information comes from invalid, pleiotropic SNPs.
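Both estimators can be sketched in a few lines of Python. The code below is an illustrative assumption rather than the software used in the study: function names and input values are hypothetical, MR-Egger is written as a weighted linear regression with inverse-variance weights from the SNP-outcome standard errors, and the weighted median uses linear interpolation on the cumulative weights. Standard errors and confidence intervals are omitted.

```python
def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Weighted regression of SNP-outcome on SNP-exposure effects with an intercept."""
    # Orient every SNP so that its effect on the exposure is positive.
    signs = [1.0 if bx >= 0 else -1.0 for bx in beta_exposure]
    x = [s * bx for s, bx in zip(signs, beta_exposure)]
    y = [s * by for s, by in zip(signs, beta_outcome)]
    w = [1.0 / se ** 2 for se in se_outcome]
    w_sum = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / w_sum
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx                  # causal estimate under InSIDE
    intercept = y_bar - slope * x_bar  # average directional pleiotropic effect
    return intercept, slope


def weighted_median(ratios, weights):
    """Weighted median of Wald ratios with linear interpolation."""
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    r = [ratios[i] for i in order]
    w = [weights[i] for i in order]
    total = sum(w)
    cumulative, s = 0.0, []
    for wi in w:
        cumulative += wi
        s.append((cumulative - 0.5 * wi) / total)  # mid-point cumulative weight
    if s[0] >= 0.5:
        return r[0]
    for j in range(1, len(r)):
        if s[j] >= 0.5:
            frac = (0.5 - s[j - 1]) / (s[j] - s[j - 1])
            return r[j - 1] + frac * (r[j] - r[j - 1])
    return r[-1]


# Hypothetical summary statistics (illustration only).
bx = [0.10, -0.20, 0.05, 0.15]
by = [-0.04, 0.09, -0.015, -0.065]
se = [0.01, 0.01, 0.01, 0.01]

intercept, slope = mr_egger(bx, by, se)
ratios = [b / a for a, b in zip(bx, by)]
weights = [(a / s) ** 2 for a, s in zip(bx, se)]  # inverse variance of each ratio
wm = weighted_median(ratios, weights)
print(f"Egger intercept {intercept:.3f}, slope {slope:.3f}, weighted median {wm:.3f}")
```

In this toy data set the SNP-outcome effects were constructed with a constant offset, so the Egger intercept is non-zero while the slope recovers the underlying effect.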
Additional assumptions for the two-sample setting of the present study include that the samples should represent the same underlying population (in the present study, a European ancestry population) with minimal sample overlap between studies.

ESM Methods 2. Observational mediation analysis in NHANES 2013-2014
We used publicly available data from NHANES 2013-2014. From this survey, we selected a subset of 1912 non-Hispanic white participants aged 30 years or older with complete data on age, sex, BMI, hours of TV watching, smoking, SBP and DBP. Educational level was categorized according to ISCED and then assigned a value in years of schooling. Years of schooling was then rescaled so that a one-unit increase represented 4.2 years of schooling. Participants were categorized as diabetic if they had an HbA1c ≥ 6.5%, a fasting plasma glucose ≥ 7 mmol/L, or a 2-hour plasma glucose ≥ 7.8 mmol/L after an oral glucose tolerance test. BMI was calculated by dividing weight (in kg) by height squared (in m²). Television (TV) watching was an ordinal variable but was analyzed as if it were numerical (range 0-5 hours) to mirror the MR analysis. For practical and computational reasons, smoking was recoded into ever versus never smoking and treated as a numerical variable. SBP and DBP were averaged over repeated blood pressure measurements according to recommendations. Physical activity, as measured by accelerometry, was available in the NHANES 2013-2014 survey, but was left out of the analysis for consistency with the final MR analyses.
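The variable definitions above can be expressed as a short Python sketch. The function and parameter names are hypothetical, and the cut-offs are those stated in the text; the mapping from ISCED category to years of schooling is not given in the source and is therefore left out.

```python
def bmi(weight_kg, height_m):
    """Body-mass index: weight (kg) divided by height squared (m^2)."""
    return weight_kg / height_m ** 2


def has_diabetes(hba1c_pct=None, fasting_glucose_mmol=None, ogtt_2h_glucose_mmol=None):
    """Apply the three criteria from the text; a missing measurement is skipped."""
    checks = [
        (hba1c_pct, 6.5),             # HbA1c in %
        (fasting_glucose_mmol, 7.0),  # fasting plasma glucose in mmol/L
        (ogtt_2h_glucose_mmol, 7.8),  # 2-hour plasma glucose in mmol/L, as stated in the text
    ]
    return any(value is not None and value >= cutoff for value, cutoff in checks)


def schooling_units(years_of_schooling, unit_years=4.2):
    """Rescale so that one unit corresponds to 4.2 years of schooling."""
    return years_of_schooling / unit_years


# Hypothetical participant (illustration only).
print(bmi(81.0, 1.80))
print(has_diabetes(hba1c_pct=5.4, fasting_glucose_mmol=7.2))
print(schooling_units(12.6))
```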
We accounted for the complex sampling strategy in NHANES by applying the recommended weighting procedure using the survey R-package [5]. We then used the svyglm() function implemented in the survey R-package for multivariable regression modeling of exposure-mediator-outcome associations. In single mediator models, indirect effects through the mediator of interest were estimated using the product of coefficients method, where coefficients were obtained from linear or logistic regression as appropriate. Standard errors were approximated using the delta method. To estimate the proportion mediated by multiple mediators, we applied the difference method. All observational analyses were performed using R version 4.03 software [6].
Single mediator analysis in NHANES 2013-2014.
a represents an estimate of the association between EA and mediator, per 4.2 years of schooling, in a linear regression model.
b represents an estimate of the association between mediator and T2D, conditional on EA, in a logistic regression model.
c' represents an estimate of the direct effect of EA on T2D, conditional on the mediator, in a logistic regression model.
a*b represents an estimate of the indirect effect of EA on T2D through the mediator.
c=a*b+c' represents an estimate of the total effect of EA on T2D.
Proportion mediated is calculated as a*b / c.
Standard errors (se) were calculated using the delta method.
Abbreviations: BMI, body-mass index; DBP, diastolic blood pressure; EA, educational attainment; SBP, systolic blood pressure; T2D, type 2 diabetes; TV, television watching.
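The quantities defined in this legend can be computed as in the following Python sketch. It is an illustration under stated assumptions, not the authors' R code: the function names and input values are hypothetical, and the delta-method standard error is the first-order form that treats the estimates of a and b as independent.

```python
import math


def indirect_effect(a, se_a, b, se_b):
    """Product of coefficients a*b with a first-order delta-method standard error."""
    ab = a * b
    se_ab = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    return ab, se_ab


def proportion_mediated(a, b, c_prime):
    """Single mediator: a*b / c, with total effect c = a*b + c'."""
    ab = a * b
    return ab / (ab + c_prime)


def proportion_mediated_difference(c_total, c_direct):
    """Difference method: share of the total effect removed by adjusting for mediators."""
    return (c_total - c_direct) / c_total


# Hypothetical coefficients (illustration only): a from a linear model,
# b and c' on the log-odds scale from a logistic model.
a, se_a = -1.0, 0.2
b, se_b = 0.1, 0.02
c_prime = -0.3

ab, se_ab = indirect_effect(a, se_a, b, se_b)
print(f"indirect effect {ab:.3f} (se {se_ab:.3f})")
print(f"proportion mediated {proportion_mediated(a, b, c_prime):.2f}")
print(f"difference method {proportion_mediated_difference(ab + c_prime, c_prime):.2f}")
```

The two proportions coincide here only because the total effect was constructed as a*b + c'; with coefficients estimated from separate logistic models, the product and difference methods need not agree exactly.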

ESM Figure 1. Directed acyclic graph for 2-step Mendelian randomisation analysis
Causal directed acyclic graph illustrating the effect of X on Y through mediation by Z. X is the exposure (educational attainment); Y is the outcome (type 2 diabetes); Z represents mediators (BMI, SBP, DBP, smoking, TV watching); GX and GZ are genetic instruments of X and Z, respectively.
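The paths in this graph correspond to a decomposition that can be written out explicitly. The notation below is assumed for illustration and does not appear in the source: \beta_{XY} is the total effect of X on Y estimated with GX, \beta_{XZ} is the effect of X on Z, and \beta_{ZY \mid X} is the effect of Z on Y adjusted for X in a multivariable MR model. It shows the product of coefficients form commonly used in two-step MR, which may differ in detail from the authors' implementation.

```latex
\begin{align}
\beta_{\text{indirect}} &= \beta_{XZ}\,\beta_{ZY \mid X} \\
\beta_{\text{direct}} &= \beta_{XY} - \beta_{\text{indirect}} \\
\text{proportion mediated} &= \frac{\beta_{\text{indirect}}}{\beta_{XY}}
\end{align}
```

As a worked example using the reported figures, the total effect OR of 0.53 corresponds to \beta_{XY} = \ln 0.53 \approx -0.635 on the log-odds scale, so the 51.18% mediated by BMI would correspond to an indirect effect of roughly 0.5118 \times (-0.635) \approx -0.325.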